Statistical Limits of Convex Relaxations

Zhaoran Wang* Quanquan Gu† Han Liu*


Abstract 

Many high dimensional sparse learning problems are formulated as nonconvex optimization. 
A popular approach to solve these nonconvex optimization problems is through convex relaxations 
such as linear and semidefinite programming. In this paper, we study the statistical limits of convex 
relaxations. Particularly, we consider two problems: Mean estimation for sparse principal submatrix 
and edge probability estimation for stochastic block model. We exploit the sum-of-squares relaxation 
hierarchy to sharply characterize the limits of a broad class of convex relaxations. Our result shows 
statistical optimality needs to be compromised for achieving computational tractability using convex 
relaxations. Compared with existing results on computational lower bounds for statistical problems, 
which consider general polynomial-time algorithms and rely on computational hardness hypotheses 
on problems like planted clique detection, our theory focuses on a broad class of convex relaxations 
and does not rely on unproven hypotheses. 


1 Introduction 

A broad variety of high dimensional statistical problems are formulated as nonconvex optimization. 
For example, sparse estimation can be formulated as optimization under $\ell_0$-norm constraints, where
the $\ell_0$-norm is a pseudo-norm defined as the number of nonzero elements in a vector. To solve these
nonconvex optimization problems, a popular approach is to resort to convex relaxations. Particularly,
for sparse estimation, significant progress has been made by using the $\ell_1$-norm as a convex relaxation for
the nonconvex $\ell_0$-norm (see, e.g., Bühlmann and van de Geer (2011); Chandrasekaran et al. (2012)
and the references therein).

In this paper, we study the statistical limits of convex relaxations. In particular, we focus on the 
sum-of-squares (SoS) hierarchy of convex relaxations (Lasserre, 2001; Parrilo, 2000, 2003), which is 
made up of a sequence of increasingly tighter convex relaxations based on semidefinite programming. 
We study the SoS hierarchy because it attains tighter approximations than other hierarchies such as 
the hierarchies proposed by Sherali and Adams (1990) and Lovász and Schrijver (1991), as well as
their extensions (see Laurent (2003) for a comparison). Hence, the estimators in the SoS hierarchy
achieve better statistical performance than the estimators within other weaker hierarchies, which
suggests the statistical limits of the SoS hierarchy are also the limits of weaker hierarchies.
To demonstrate the statistical limits of convex relaxations, we focus on the examples of sparse
principal submatrix estimation and stochastic block model estimation. In detail, for sparse principal
submatrix estimation, we assume there is an $s^* \times s^*$ submatrix with elevated mean $\beta^*$ on the diagonal
of a $d \times d$ noisy symmetric matrix. For stochastic block model estimation, we assume there exists a
dense subgraph with $s^*$ nodes planted in an Erdős-Rényi graph with $d$ nodes. We denote by $\beta^*$ the
edge probability of the subgraph. For both examples, our goal is to estimate $\beta^*$ under a challenging
regime where $s^* = o[(d/\sqrt{\log d})^{2/3}]$ and $\log d = o(s^*)$. We prove the following information-theoretic
lower bound

$$\inf_{\hat\beta} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*| \geq C \sqrt{1/s^* \cdot \log(d/s^*)}, \tag{1.1}$$

where $\hat\beta$ denotes any estimator, $\mathcal{P}(s^*,d)$ is the distribution family to be specified later and $C$ is an
absolute constant. We prove that a computationally intractable estimator (to be specified later)
attains the lower bound in (1.1). In order to achieve computational tractability, we consider convex
relaxations of this estimator that fall within the SoS and weaker hierarchies, which are denoted by $\mathcal{H}$. Let $C'$
be a positive absolute constant. We prove that under certain conditions,

$$\inf_{\hat\beta \in \mathcal{H}} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*| \geq C'. \tag{1.2}$$

Together with (1.1), (1.2) illustrates the statistical limitations of a broad class of convex relaxations.
Ignoring the logarithmic factor, (1.1) and (1.2) suggest there exists a gap of $\sqrt{s^*}$ between the limits
for any estimator and the limits for estimators within the hierarchies of convex relaxations. Hence,
this result shows statistical optimality must be sacrificed for gaining computational tractability with
convex relaxations. For sparse principal submatrix estimation, we prove that a linear-time estimator
within $\mathcal{H}$ attains the lower bound in (1.2) up to a logarithmic factor, and is therefore nearly optimal
within a general family of convex relaxations.

Our work is closely related to a recent line of research on computational barriers for statistical 
problems (Berthet and Rigollet, 2013a,b; Ma and Wu, 2013; Krauthgamer et al., 2013; Arias-Castro
and Verzelen, 2014; Zhang et al., 2014; Chen and Xu, 2014; Gao et al., 2014; Hajek et al., 2014;
Wang et al., 2014; Cai et al., 2015). Under various computational hardness hypotheses on problems
like planted clique detection, these works quantify the gap between the information-theoretic limits 
and the statistical accuracy achievable by polynomial-time algorithms. For this purpose, their proofs 
are based on polynomial-time reductions from hard computational problems to statistical problems. 
In contrast with these works, we focus on the statistical limits of a broad class of convex relaxations 
rather than all polynomial-time algorithms. Correspondingly, our theory does not hinge on unproven 
computational hardness hypotheses, and our proof is based on constructions rather than reductions. 
Also, based on another perspective, Chandrasekaran and Jordan (2013) study the tradeoffs between 
computational complexity and statistical performance for normal mean estimation via hierarchies of 
convex relaxations. Their results are based on hierarchies of convex constraints, which are obtained 
by successively weakening the cone representation of the original constraint set. In comparison, our 
results are based on hierarchies of convex relaxations of the optimization problem itself rather than 
the constraints, which are obtained by successively tightening a basic semidefinite relaxation using 
variable augmentation techniques. In addition, our work is connected to previous works on the SoS
and other convex relaxation hierarchies (see, e.g., Chlamtac and Tulsiani (2012); Barak and Steurer 
(2014); Barak and Moitra (2015); Meka et al. (2015) and the references therein). In particular, the 
key construction of feasible solutions in our proof is based on the dual certificates for the maximum 
clique problem proposed by Meka et al. (2015). 

The rest of this paper is organized as follows. In §2 we introduce the statistical models. In §3 we 
present the SoS hierarchy of convex relaxations and apply it to estimate the models in §2. In §4 we 
establish the main results and lay out the proofs in §5. In §6 we conclude the paper. 


2 Statistical Model 

In the sequel, we briefly introduce the statistical models considered in this paper. Then we present 
several common estimators for them. 


2.1 Sparse Principal Submatrix Estimation 

Let $X \in \mathbb{R}^{d \times d}$ be a random matrix from distribution $P$ and $\mathbb{E}(X) = \Theta$. We assume there exists an
index set $S^* \subseteq \{1,\ldots,d\}$ with $|S^*| = s^*$ that satisfies $\Theta_{i,j} = \beta^*$ for $i \neq j$ and $(i,j) \in S^* \times S^*$, while
$\Theta_{i,j} = 0$ for $i \neq j$ and $(i,j) \notin S^* \times S^*$. Here $\beta^* \geq 0$ is the signal strength. For all $i < j$, we assume
that $X_{i,j}$'s are independently sub-Gaussian with $\mathbb{E}(X_{i,j}) = \Theta_{i,j}$ and $\|X_{i,j} - \Theta_{i,j}\|_{\psi_2} \leq 1$. In addition,
we assume that $X_{i,i} = 0$ and $X_{i,j} = X_{j,i}$. We aim to estimate the signal strength $\beta^*$. For simplicity,
hereafter we assume $s^*$ is known. We denote by $\mathcal{P}(s^*,d)$ the family of distributions $P$ satisfying the
above constraints.
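As an illustration of the model above, the following minimal sketch draws one symmetric matrix with a planted principal submatrix. The function name `sample_submatrix_model`, the seed argument, and the choice of unit-variance Gaussian noise are assumptions made for this example only; the text allows any sub-Gaussian noise, and Gaussian noise meets the norm bound only up to rescaling of the variance.

```python
import random

def sample_submatrix_model(d, s_star, beta_star, seed=0):
    """Draw (X, S): X is symmetric with zero diagonal and mean beta_star on S x S.

    Hypothetical helper for illustration; Gaussian noise is an assumption.
    """
    rng = random.Random(seed)
    support = set(rng.sample(range(d), s_star))  # hidden index set S*
    X = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            inside = i in support and j in support
            mean = beta_star if inside else 0.0
            # Entries above the diagonal are independent; the matrix is symmetrized.
            X[i][j] = X[j][i] = rng.gauss(mean, 1.0)
    return X, support

X, S = sample_submatrix_model(d=8, s_star=3, beta_star=2.0)
```

Indices are 0-based in the code, whereas the text indexes rows and columns from 1 to $d$.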

This estimation problem is closely related to the problems considered by Shabalin et al. (2009); 
Kolar et al. (2011); Butucea and Ingster (2013); Butucea et al. (2013); Ma and Wu (2013); Sun and 
Nobel (2013); Cai et al. (2015). These works consider the detection problem and the recovery of S*, 
while we consider the estimation of signal strength. Besides, we focus on symmetric X for simplicity. 
We consider the following estimator for $\beta^*$ proposed by Butucea and Ingster (2013),

$$\hat\beta^{\rm scan} = \max_{S \subseteq \{1,\ldots,d\},\, |S| = s^*} \frac{1}{|S|(|S|-1)} \sum_{(i,j) \in S \times S} X_{i,j}, \tag{2.1}$$

where $|S|$ is the cardinality of set $S$. The intuition is to exhaustively search all principal
submatrices of cardinality $s^*$ and calculate the average of all entries within each principal submatrix.
In §4 we will prove that $\hat\beta^{\rm scan}$ attains the information-theoretic lower bound for estimating $\beta^*$ within
$\mathcal{P}(s^*,d)$ under a challenging regime where $s^* = o[(d/\sqrt{\log d})^{2/3}]$. Nevertheless, it is computationally
intractable to obtain $\hat\beta^{\rm scan}$. In §3 we will introduce convex relaxations of $\hat\beta^{\rm scan}$. We also consider the
following computationally tractable estimators

$$\hat\beta^{\rm avg} = \frac{1}{s^*(s^*-1)} \sum_{i,j=1}^{d} X_{i,j}, \qquad \hat\beta^{\max} = \max_{i,j \in \{1,\ldots,d\}} X_{i,j} \tag{2.2}$$

for further discussion in §4.
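The three estimators can be sketched in a few lines for small instances. This is only an illustration under stated assumptions: the scan estimator is taken to be the largest average over principal submatrices of size $s^*$, the average estimator the sum of all entries divided by $s^*(s^*-1)$, and the max estimator the largest entry. The function names are hypothetical, and the brute-force search is exponential in $s^*$, which is exactly why the text calls it intractable.

```python
from itertools import combinations

def beta_scan(X, s):
    """Exhaustive scan: best average over all s x s principal submatrices."""
    d = len(X)
    best = float("-inf")
    for S in combinations(range(d), s):
        total = sum(X[i][j] for i in S for j in S)  # diagonal entries are zero
        best = max(best, total / (s * (s - 1)))
    return best

def beta_avg(X, s):
    """Sum of all entries, normalized by s (s - 1)."""
    return sum(sum(row) for row in X) / (s * (s - 1))

def beta_max(X):
    """Largest entry of X."""
    return max(max(row) for row in X)

X = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
print(beta_scan(X, 2), beta_avg(X, 2), beta_max(X))
```

The scan loop visits every subset of size $s^*$, while the other two estimators touch each entry once.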
2.2 Stochastic Block Model


We consider the estimation of edge probability in a dense subgraph with $s^*$ nodes planted within an
Erdős-Rényi graph with $d$ nodes. If a pair of nodes are within the subgraph, they are independently
connected with edge probability $\beta^* \in [0,1]$. Otherwise, they are independently connected with edge
probability $\bar\beta^* \in [0,\beta^*]$. We denote by $\mathcal{P}(s^*,d)$ the distribution family of graphs which satisfy the
above constraints and by $A \in \mathbb{R}^{d \times d}$ the adjacency matrix. We assume $A_{i,i} = 0$ for all $i \in \{1,\ldots,d\}$
and $s^*$ is known. Similar to principal submatrix estimation, we focus on the challenging regime with
$s^* = o[(d/\sqrt{\log d})^{2/3}]$. Additionally, we assume $\log(d/s^*)/(s^* \bar\beta^*) = o(1)$ so that $s^*$ is not too small.

This estimation problem is connected to the problems studied by Kučera (1995); Coja-Oghlan
(2010); Bhaskara et al. (2010); Fortunato (2010); Decelle et al. (2011); Mossel et al. (2012, 2013);
Verzelen and Arias-Castro (2013); Arias-Castro and Verzelen (2014); Massoulié (2014); Hajek et al.
(2014); Chen and Xu (2014); Meka et al. (2015). However, we mainly focus on estimating the edge
probability of the dense subgraph rather than detection or recovery of subgraphs. Also, we assume
that the dense subgraph and its size are fixed rather than random as in some of the existing works.
To estimate $\beta^*$, we use $\hat\beta^{\rm scan}$ and $\hat\beta^{\max}$ defined in (2.1) and (2.2) with $X_{i,j}$ replaced by $A_{i,j}$. Though
stochastic block model estimation is closely related to sparse principal submatrix estimation, in §4 we will
illustrate that the respective upper and lower bounds have subtle differences because of the different
deviations of Bernoulli random variables and general sub-Gaussian random variables, which possibly
have unbounded support.

3 Convex Relaxation Hierarchy 

In this section, we first introduce some specific notations which will greatly simplify our presentation.
Then we introduce the SoS hierarchy for $\hat\beta^{\rm scan}$ defined in (2.1).

Notation: We define a collection $C$ to be an unordered array of elements, where each element can
appear more than once. For instance, $\{1\}$, $\{1,2\}$ and $\{1,1\}$ are all collections. Let the summation
between two collections be the combination of all elements in them, e.g., for $C_1 = \{1,2\}$, $C_2 = \{1,3\}$
we have $C_1 + C_2 = \{1,1,2,3\}$. Note that a collection is different from a set, because a set has distinct
elements. Let the merge operation $M(\cdot)$ on a collection be the operation that eliminates the duplicate
elements and outputs a set, e.g., for $C = \{1,1,2,2,3\}$ we have $M(C) = \{1,2,3\}$, which is a set. We
use $|C|$ and $|S|$ to denote the cardinality of a collection and a set. Also, we denote by $C_1 = C_2$ if they
contain the same elements. For integer $\ell \geq 0$, we define $d^{(\ell)} = \sum_{i=0}^{\ell} d^i$ for notational simplicity.
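The collection notation maps naturally onto multisets. The sketch below is one possible encoding, assuming collections are stored as Python tuples; the helper names (`add_collections`, `merge`, `same_collection`, `d_level`) are hypothetical and not taken from the paper.

```python
from collections import Counter

def add_collections(c1, c2):
    """Sum of two collections: keep every element, duplicates included."""
    return tuple(sorted(c1 + c2))

def merge(c):
    """Merge operation M(.): drop duplicates and return a set."""
    return frozenset(c)

def same_collection(c1, c2):
    """Two collections are equal when they contain the same elements with multiplicity."""
    return Counter(c1) == Counter(c2)

def d_level(d, ell):
    """d^(ell) = sum_{i=0}^{ell} d^i, the number of ordered index tuples of length at most ell."""
    return sum(d ** i for i in range(ell + 1))

print(add_collections((1, 2), (1, 3)))   # (1, 1, 2, 3)
print(sorted(merge((1, 1, 2, 2, 3))))    # [1, 2, 3]
print(d_level(3, 2))                     # 13
```

Sorting inside `add_collections` is only a convenience so that equal collections compare equal as tuples.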

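Note that $\hat\beta^{\rm scan}$ in (2.1) can be reformulated as

$$\hat\beta^{\rm scan} = \max_{v \in \mathcal{V}_{s^*}} \frac{v^\top X v}{s^*(s^*-1)}, \quad \text{where } \mathcal{V}_{s^*} = \Bigl\{ v : v \in \{0,1\}^d,\ \sum_{i=1}^{d} v_i = s^* \Bigr\}. \tag{3.1}$$

Because (3.1) involves maximizing a convex function subject to nonconvex constraints, it is
computationally intractable to solve. Note that $v^\top X v = {\rm tr}(X v v^\top)$ in (3.1). We can reparameterize $v v^\top$ to
be a $d \times d$ positive semidefinite matrix with rank one. For notational simplicity, we define

$$Y = \begin{pmatrix} 0 & 0_{1 \times d} \\ 0_{d \times 1} & X \end{pmatrix}, \quad
\Pi = (1, v^\top)^\top (1, v^\top) = \begin{pmatrix} 1 & \Pi_{0,1} & \cdots & \Pi_{0,d} \\ \Pi_{1,0} & \Pi_{1,1} & \cdots & \Pi_{1,d} \\ \vdots & \vdots & \ddots & \vdots \\ \Pi_{d,0} & \Pi_{d,1} & \cdots & \Pi_{d,d} \end{pmatrix}, \quad v_0 = 1. \tag{3.2}$$

Here $Y, \Pi \in \mathbb{R}^{(d+1) \times (d+1)}$ and $0_{d_1 \times d_2}$ denotes a $d_1 \times d_2$ matrix whose entries are all zero. Meanwhile,
note that $\mathcal{V}_{s^*}$ defined in (3.1) can be reformulated as

$$\mathcal{V}_{s^*} = \Bigl\{ v : \sum_{i=1}^{d} v_i = s^*,\ v_i^2 - v_i = 0 \text{ for all } i \in \{1,\ldots,d\} \Bigr\}. \tag{3.3}$$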

According to the reparametrization in (3.2), it holds that $\Pi_{i,j} = v_i v_j$ for all $i,j \in \{0,\ldots,d\}$. Hence,
from (3.1) we obtain the following semidefinite program

$$\begin{aligned}
\max_{\Pi} \quad & \frac{{\rm tr}(Y\Pi)}{s^*(s^*-1)}, \\
\text{subject to} \quad & \sum_{i=1}^{d} \Pi_{i,0} = s^*, \quad \Pi_{0,0} = 1, \quad \Pi \succeq 0, \\
& \Pi_{i,j} = \Pi_{j,i}, \quad \Pi_{i,i} = \Pi_{i,0}, \quad \text{for all } i,j \in \{0,\ldots,d\},
\end{aligned} \tag{3.4}$$

in which $\sum_{i=1}^{d} \Pi_{i,0} = s^*$ corresponds to $\sum_{i=1}^{d} v_i = s^*$, $\Pi_{i,j} = \Pi_{j,i}$ corresponds to $v_i v_j = v_j v_i$, while
$\Pi_{i,i} = \Pi_{i,0}$ corresponds to $v_i^2 - v_i = 0$. Note that if ${\rm rank}(\Pi) = 1$, then from our reparametrization
in (3.2), the maximum of (3.4) equals the maximum of (3.1). However, we drop this rank constraint
since it is nonconvex, and hence (3.4) is a convex relaxation of (3.1).
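To make the lifting concrete, the sketch below builds the rank-one matrix $(1, v^\top)^\top (1, v^\top)$ for a 0/1 vector $v$ and checks the linear constraints of the basic semidefinite program, namely the row-sum, corner, symmetry and diagonal conditions. The helper names are hypothetical, and positive semidefiniteness is not checked because it holds automatically for a rank-one outer product.

```python
def lifted_matrix(v):
    """Return Pi = (1, v)^T (1, v) as a list of lists, indexed from 0 to d."""
    w = [1] + list(v)
    return [[a * b for b in w] for a in w]

def satisfies_linear_constraints(Pi, s):
    """Check sum_i Pi[i][0] = s, Pi[0][0] = 1, symmetry, and Pi[i][i] = Pi[i][0]."""
    n = len(Pi)
    return (
        sum(Pi[i][0] for i in range(1, n)) == s
        and Pi[0][0] == 1
        and all(Pi[i][j] == Pi[j][i] for i in range(n) for j in range(n))
        and all(Pi[i][i] == Pi[i][0] for i in range(n))
    )

def objective(X, Pi, s):
    """tr(Y Pi) / (s (s - 1)), where Y embeds X in the lower-right d x d block."""
    d = len(X)
    total = sum(X[i][j] * Pi[j + 1][i + 1] for i in range(d) for j in range(d))
    return total / (s * (s - 1))

X = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
v = [1, 0, 1]
Pi = lifted_matrix(v)
print(satisfies_linear_constraints(Pi, 2), objective(X, Pi, 2))
```

For this $v$ the objective equals $v^\top X v / (s^*(s^*-1))$, so feasible rank-one points reproduce the value of the original combinatorial problem.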

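The SoS hierarchy is obtained by increasingly tightening the basic semidefinite program in (3.4)
using variable augmentation techniques. In particular, the reparametrization in (3.4) only involves
the second order interaction between $v_i$ and $v_j$. For integer $\ell \geq 1$, we consider a $d^{(\ell)} \times d^{(\ell)}$ matrix $\Pi^{(\ell)}$,
where $d^{(\ell)} = \sum_{i=0}^{\ell} d^i$ in our notations. For notational simplicity, we index the entries of $\Pi^{(\ell)}$ using
collections $C_1$ and $C_2$ with $|C_1|, |C_2| \leq \ell$, whose elements are indices $1,\ldots,d$. Our reparametrization
takes the form

$$\Pi^{(\ell)}_{C_1,C_2} = \prod_{i \in C_1} v_i \cdot \prod_{j \in C_2} v_j = \prod_{i \in C_1 + C_2} v_i. \tag{3.5}$$

In particular, for $C = \emptyset$ we define $\prod_{i \in C} v_i = 1$. The $\ell$-th level SoS relaxation of (3.1) takes the form

$$\begin{aligned}
\max_{\Pi^{(\ell)}} \quad & \frac{{\rm tr}(Y^{(\ell)} \Pi^{(\ell)})}{s^*(s^*-1)}, \\
\text{subject to} \quad & \sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}, \quad \text{for all } |C_1| \leq \ell-1,\ |C_2| \leq \ell, \\
& \Pi^{(\ell)}_{C_1+\{i,i\},C_2} = \Pi^{(\ell)}_{C_1+\{i\},C_2}, \quad \text{for all } i \in \{1,\ldots,d\},\ |C_1| \leq \ell-2,\ |C_2| \leq \ell, \\
& \Pi^{(\ell)}_{C_1,C_2} = \Pi^{(\ell)}_{C_1',C_2'}, \quad \text{for all } C_1+C_2 = C_1'+C_2',\ |C_1|,|C_2|,|C_1'|,|C_2'| \leq \ell, \\
& \Pi^{(\ell)}_{\emptyset,\emptyset} = 1, \quad \Pi^{(\ell)} \succeq 0,
\end{aligned} \tag{3.6}$$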
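where $Y^{(\ell)} \in \mathbb{R}^{d^{(\ell)} \times d^{(\ell)}}$ is defined as

$$Y^{(\ell)} = \begin{pmatrix}
0 & 0_{1 \times d} & \cdots & 0_{1 \times d^\ell} \\
0_{d \times 1} & X & \cdots & 0_{d \times d^\ell} \\
\vdots & \vdots & \ddots & \vdots \\
0_{d^\ell \times 1} & 0_{d^\ell \times d} & \cdots & 0_{d^\ell \times d^\ell}
\end{pmatrix}.$$

In (3.6), the first constraint corresponds to the reparametrization in (3.5) and

$$\prod_{j \in C} v_j \cdot \sum_{i=1}^{d} v_i = s^* \cdot \prod_{j \in C} v_j, \quad \text{for all } |C| \leq 2\ell - 1,$$

which is equivalent to $\sum_{i=1}^{d} v_i = s^*$ in (3.3). The second constraint corresponds to (3.5) and

$$\prod_{j \in C} v_j \cdot v_i \cdot v_i = \prod_{j \in C} v_j \cdot v_i, \quad \text{for all } |C| \leq 2\ell - 2,$$

which is equivalent to $v_i^2 - v_i = 0$ in (3.3). The third constraint corresponds to (3.5) and

$$\prod_{j \in C_1 + C_2} v_j = \prod_{j \in C_1' + C_2'} v_j, \quad \text{for all } C_1+C_2 = C_1'+C_2',\ |C_1|,|C_2|,|C_1'|,|C_2'| \leq \ell.$$

The last constraint that $\Pi^{(\ell)}_{\emptyset,\emptyset} = 1$ follows from (3.5) and our definition that $\prod_{i \in C} v_i = 1$ for $C = \emptyset$.
For $\ell = 1$, (3.6) reduces to the basic semidefinite relaxation in (3.4). We denote by $\hat\beta^{(\ell)}$ the maximum
of (3.6). We have

$$\hat\beta^{\rm scan} \leq \cdots \leq \hat\beta^{(\ell)} \leq \cdots \leq \hat\beta^{(2)} \leq \hat\beta^{(1)},$$

since we have more constraints in (3.6) for a larger $\ell$. Thus, for a larger $\ell$ (3.6) gives a tighter convex
relaxation of (3.1). Meanwhile, note that the semidefinite program in (3.6) can be solved in $O(d^{O(\ell)})$
operations. Hereafter we focus on the settings where $\ell$ does not increase with $d$.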

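Laurent (2003) proves that other existing convex relaxation hierarchies, such as the Sherali-Adams
and Lovász-Schrijver hierarchies as well as their extensions, are weaker than the SoS hierarchy in the
sense that $\hat\beta^{(\ell)} \leq \hat\beta^{(\ell)}_{\rm other}$, where $\hat\beta^{(\ell)}_{\rm other}$ denotes the $\ell$-th level of other weaker hierarchies. Note
that relaxing constraints and objectives in the convex relaxations also leads to looser approximations
of $\hat\beta^{\rm scan}$. Hence, we denote by $\mathcal{H}^{(\ell)}$ the class of estimators $\hat\beta$ that fall in the $\ell$-th level of the SoS and
weaker hierarchies, as well as their weakened versions obtained by relaxing constraints and objectives.
By this definition, we have $\mathcal{H}^{(1)} \subseteq \mathcal{H}^{(2)} \subseteq \cdots$. For example, for $\ell > 1$ we can drop constraints in (3.6)
to obtain (3.4), which corresponds to $\ell = 1$. In particular, from (3.1) we have

$$\hat\beta^{\rm scan} = \max_{v \in \mathcal{V}_{s^*}} \frac{v^\top X v}{s^*(s^*-1)}
\leq \max_{u,v \in \mathcal{V}_{s^*}} \frac{u^\top X v}{s^*(s^*-1)}
\leq \max_{u,v \in \bar{\mathcal{V}}_{s^*}} \frac{u^\top X v}{s^*(s^*-1)}
\leq \max_{\Pi \in \mathcal{W}_{s^*}} \frac{{\rm tr}(X\Pi)}{s^*(s^*-1)}, \tag{3.7}$$

$$\text{where } \bar{\mathcal{V}}_{s^*} = \Bigl\{ v : \sum_{i=1}^{d} v_i = s^*,\ v_i \geq 0 \text{ for all } i \in \{1,\ldots,d\} \Bigr\}, \quad
\mathcal{W}_{s^*} = \Bigl\{ \Pi : \sum_{i,j=1}^{d} \Pi_{i,j} = (s^*)^2,\ \Pi_{i,j} \geq 0 \text{ for all } i,j \in \{1,\ldots,d\} \Bigr\}.$$

Here $\bar{\mathcal{V}}_{s^*}$ is a linear relaxation of $\mathcal{V}_{s^*}$. Note that the right-hand side of (3.7) equals $s^*/(s^*-1) \cdot \hat\beta^{\max}$,
where $\hat\beta^{\max}$ is defined in (2.2). Therefore, $s^*/(s^*-1) \cdot \hat\beta^{\max}$ can be viewed as a linear programming
relaxation of $\hat\beta^{\rm scan}$, which falls within $\mathcal{H}^{(1)}$ (see, e.g., §2 of Chlamtac and Tulsiani (2012) for details).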
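The chain of relaxations ending in a linear program can be checked numerically on a toy matrix. The sketch below assumes the last feasible set consists of entrywise nonnegative matrices whose entries sum to $(s^*)^2$, so the linear objective is maximized by placing all mass on one largest entry of $X$. Both function names are hypothetical.

```python
from itertools import combinations

def scan_value(X, s):
    """Brute-force maximum of v^T X v / (s (s - 1)) over 0/1 vectors with s ones."""
    d = len(X)
    return max(
        sum(X[i][j] for i in S for j in S) / (s * (s - 1))
        for S in combinations(range(d), s)
    )

def lp_relaxation_value(X, s):
    """Maximum of tr(X Pi) / (s (s - 1)) over Pi >= 0 with entries summing to s^2.

    A linear function over this scaled simplex attains its maximum at a vertex,
    i.e. with the whole mass s^2 on a largest entry of X.
    """
    largest = max(max(row) for row in X)
    return s * s * largest / (s * (s - 1))

X = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
s = 2
print(scan_value(X, s), lp_relaxation_value(X, s))
```

On this example the relaxed value is $s^*/(s^*-1)$ times the largest entry and upper-bounds the scan value, as the displayed chain of inequalities requires.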

In addition, it is worth noting that the SoS hierarchy has several equivalent formulations. See, e.g.,
Theorem 2.7 of Barak and Steurer (2014) for a proof of such equivalence.

4 Main Result 

As defined in §3, $\mathcal{H}^{(\ell)}$ denotes the $\ell$-th level of the convex relaxation hierarchy for $\hat\beta^{\rm scan}$ defined in
(2.1). For the stochastic block model, we replace $X$ in (2.1) with the adjacency matrix $A$.

4.1 Sparse Principal Submatrix Estimation 

In the following, we present the main theoretical results for estimating the signal strength of sparse
principal submatrix. In the sequel we establish the information-theoretic lower bound for estimating
$\beta^*$ within the distribution family $\mathcal{P}(s^*,d)$ defined in §2.1.

Theorem 4.1. For all estimators $\hat\beta$ constructed using $X \sim P \in \mathcal{P}(s^*,d)$ and $s^* = o[(d/\sqrt{\log d})^{2/3}]$,
there exists an absolute constant $C > 0$ such that

$$\inf_{\hat\beta} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*| \geq C \sqrt{1/s^* \cdot \log(d/s^*)}.$$

Proof. See §5.1 for a detailed proof. □ 

In Theorem 4.1 we consider a challenging regime. More specifically, a straightforward calculation
shows that $\hat\beta^{\rm avg}$ defined in (2.2) achieves the $d/(s^*)^2$ rate of convergence. For $s^* = o[(d/\sqrt{\log d})^{2/3}]$,
we have $\sqrt{1/s^* \cdot \log(d/s^*)} = o[d/(s^*)^2]$. Thus, there exists a gap between the rate attained by $\hat\beta^{\rm avg}$
and the information-theoretic lower bound. We will show that there is also such a gap for $\hat\beta^{\max}$. The
next proposition shows $\hat\beta^{\rm scan}$ in (2.1) attains the information-theoretic lower bound in Theorem 4.1.

Proposition 4.2. For $\hat\beta^{\rm scan}$ defined in (2.1) with $X_{i,j}$ being the $(i,j)$-th entry of $X \sim P \in \mathcal{P}(s^*,d)$,
we have that

$$|\hat\beta^{\rm scan} - \beta^*| \leq C \sqrt{1/s^* \cdot \log(d/s^*)}$$

holds with probability at least $1 - 1/d$ for some absolute constant $C > 0$.

Proof. See §5.1 for a detailed proof. □ 

Theorem 4.1 and Proposition 4.2 show that $\hat\beta^{\rm scan}$ is statistically optimal under the regime where
$s^* = o[(d/\sqrt{\log d})^{2/3}]$. However, it is computationally intractable to obtain $\hat\beta^{\rm scan}$. Thus, we consider
the family of convex relaxations of $\hat\beta^{\rm scan}$ within the $\ell$-th level SoS and weaker hierarchies as well as
their further relaxations, which is denoted by $\mathcal{H}^{(\ell)}$. In the sequel, we establish a minimax lower bound
for the statistical performance of all estimators within $\mathcal{H}^{(\ell)}$. Recall that $\mathcal{P}(s^*,d)$ is the distribution
family defined in §2.1.

Theorem 4.3. We assume $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$. There is an absolute constant $C > 0$ such that

$$\inf_{\hat\beta \in \mathcal{H}^{(\ell)}} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*| \geq C.$$

Proof. See §5.1 for a detailed proof. □ 

Note that the regime considered in Theorem 4.3 is within the challenging regime considered in
Theorem 4.1. Under this regime, Theorem 4.3 proves that any estimator within the convex relaxation
hierarchy fails to attain a statistical rate that decreases when $s^*$ is increasing. A comparison between
Theorems 4.1 and 4.3 illustrates that there exists a gap of $\sqrt{s^*}$ (ignoring the $\log d$ factor) between
the information-theoretic lower bound and the statistical rate achievable by a broad class of convex
relaxations. In other words, to achieve computational tractability via convex relaxations, we have to
compromise statistical optimality.
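The size of this gap can be made concrete with a short calculation. The sketch below assumes both absolute constants equal 1, which is purely illustrative since the text does not specify them, and uses the hypothetical function names `information_theoretic_rate` and `relaxation_gap`.

```python
import math

def information_theoretic_rate(d, s):
    """Rate sqrt(log(d / s) / s) from the information-theoretic lower bound (constant set to 1)."""
    return math.sqrt(math.log(d / s) / s)

def relaxation_gap(d, s):
    """Ratio of a constant-order lower bound (taken as 1) to the information-theoretic rate."""
    return 1.0 / information_theoretic_rate(d, s)

d = 10 ** 6
for s in (10, 50, 250):
    rate = information_theoretic_rate(d, s)
    print(s, round(rate, 3), round(relaxation_gap(d, s), 3), round(math.sqrt(s), 3))
```

The ratio equals $\sqrt{s^*/\log(d/s^*)}$, which is $\sqrt{s^*}$ once the logarithmic factor is ignored.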

It is worth noting that this gap between computational tractability and statistical optimality is
effective under the regime $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$, which shrinks as $\ell$ increases. However, $\ell$ cannot
increase with $d$ and $s^*$, because otherwise the computational complexity required to solve the convex
relaxations increases exponentially, according to our discussion in §3. For $\ell$ being any constant, the
regime in Theorem 4.3 is a nontrivial subset of the regime in Theorem 4.1. As will be shown in our
proof, $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$ is a sufficient condition to establish the feasibility of the constructed
solution. In fact, for $\ell = 2$, we can further relax this condition to $s^* = o(d^{1/3}/\log d)$ with the results
of Deshpande and Montanari (2015). Under the regime in Theorem 4.3, the next proposition shows
that $\hat\beta^{\max}$ defined in (2.2) is nearly optimal under computational tractability constraints.

Proposition 4.4. For $\hat\beta^{\max}$ in (2.2), where $X_{i,j}$ is the $(i,j)$-th entry of $X \sim P \in \mathcal{P}(s^*,d)$, we have

$$|\hat\beta^{\max} - \beta^*| \leq C \sqrt{\log d}$$

holds with probability at least $1 - 1/d$ for some absolute constant $C > 0$.

Proof. See §5.1 for a detailed proof. □ 

According to (3.7) and the discussion in §3, we have $s^*/(s^*-1) \cdot \hat\beta^{\max} \in \mathcal{H}^{(1)} \subseteq \mathcal{H}^{(2)} \subseteq \cdots$. Thus $\hat\beta^{\max}$ attains
the minimax lower bound with computational constraints in Theorem 4.3 for every $\ell$ up to a $\log d$
factor, which also suggests that the lower bound in Theorem 4.3 is tight. Meanwhile, note that the
calculation of $\hat\beta^{\max}$ in (2.2) requires $O(d^2)$ operations, which is linear in the size of input. In contrast,
tighter approximations in the $\ell$-th level SoS hierarchy require $O(d^{O(\ell)})$ operations. In practice, such
a computational complexity is in general higher than the complexity for calculating $\hat\beta^{\max}$. Theorem
4.3 indicates that this extra computational cost can only result in limited possible improvements on
the statistical rate of convergence, i.e., a $\log d$ factor.

It is worth noting the gap between the lower bounds in Theorems 4.1 and 4.3 vanishes when $s^*$ is
a constant that does not increase with $d$. In this case, $\hat\beta^{\rm scan}$ achieves the information-theoretic lower
bound in Theorem 4.1. On the other hand, $\hat\beta^{\rm scan}$ is computationally tractable to obtain in this case.



4.2 Stochastic Block Model 


In this section, we present the main theory for edge probability estimation in the stochastic block model.
Recall that $\mathcal{P}(s^*,d)$ is the distribution family defined in §2.2. The following theorem establishes the
information-theoretic lower bound for estimating $\beta^*$. Recall that $\bar\beta^*$ denotes the edge probability of the
large Erdős-Rényi graph with $d$ nodes.

Theorem 4.5. For $s^* = o[(d/\sqrt{\log d})^{2/3}]$ and $\log(d/s^*)/(s^* \bar\beta^*) = o(1)$, there is an absolute constant
$C > 0$ such that

$$\inf_{\hat\beta} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*| \geq C \sqrt{1/s^* \cdot \log(d/s^*)}.$$

Proof. See §5.2 for a detailed proof. □ 

Theorem 4.5 is similar to Theorem 4.1 but needs an extra condition that $\log(d/s^*)/(s^* \bar\beta^*) = o(1)$,
which ensures $s^*$ is not too small. Recall each entry of the adjacency matrix $A$ is Bernoulli.
Arias-Castro and Verzelen (2014) show that a larger $s^*$ guarantees the moderate deviation of the Bernoulli
distribution is in effect in the lower bound. Next, we prove $\hat\beta^{\rm scan}$ achieves the information-theoretic
lower bound in Theorem 4.5 and hence is optimal.

Proposition 4.6. For $\hat\beta^{\rm scan}$ defined in (2.1), we have that with probability at least $1 - 1/d$,

$$|\hat\beta^{\rm scan} - \beta^*| \leq C \sqrt{1/s^* \cdot \log(d/s^*)}.$$

Proof. See §5.2 for a detailed proof. □ 

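The next theorem establishes the minimax lower bound on the statistical performance of convex
relaxations within $\mathcal{H}^{(\ell)}$ defined in §3.

Theorem 4.7. For $s^*$ and $d$ sufficiently large and $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$, we have

$$\inf_{\hat\beta \in \mathcal{H}^{(\ell)}} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*| \geq 1/4.$$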

Proof. See §5.2 for a detailed proof. □ 

Similar to Theorem 4.3, Theorem 4.7 shows the gap between statistical optimality and computational
tractability. Note that $\beta^* \in [0,1]$. Meanwhile, it is easy to show $\mathbb{P}(\hat\beta^{\max} = 1) \geq 1 - (1 - \beta^*)^{(d^2-d)/2}$.
Therefore, $\hat\beta^{\max}$ exactly attains such a minimax lower bound under computational constraints up to
constants. From another point of view, for $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$ every estimator within $\mathcal{H}^{(\ell)}$ is
at most as accurate as the trivial estimator $\hat\beta = 1$.
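The behavior of the max estimator on a graph is easy to see in simulation: it equals 1 as soon as the graph contains a single edge. The sketch below samples a planted dense subgraph and evaluates it. The names `sample_planted_subgraph`, `p_in` and `p_out` are hypothetical stand-ins for the model's inside and outside edge probabilities.

```python
import random

def sample_planted_subgraph(d, s_star, p_in, p_out, seed=0):
    """Return (A, planted): a symmetric 0/1 adjacency matrix with zero diagonal.

    Pairs inside the planted node set are connected with probability p_in,
    all other pairs with probability p_out (both names are illustrative).
    """
    rng = random.Random(seed)
    planted = set(rng.sample(range(d), s_star))
    A = [[0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            p = p_in if (i in planted and j in planted) else p_out
            A[i][j] = A[j][i] = 1 if rng.random() < p else 0
    return A, planted

def beta_max(A):
    """Largest entry of the adjacency matrix."""
    return max(max(row) for row in A)

A, planted = sample_planted_subgraph(d=30, s_star=5, p_in=0.8, p_out=0.2)
print(beta_max(A))
```

Because the output carries no information beyond whether any edge exists, this estimator behaves like the trivial constant estimator on all but empty graphs.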

Theorems 4.3 and 4.7 are similar. Note that for sparse principal submatrix estimation we consider
sub-Gaussian entries, while in the adjacency matrix for stochastic block model each entry is Bernoulli.
A direct way to establish Theorem 4.3 is to adapt the construction of $P$ in the proof of Theorem 4.7,
since Bernoulli is sub-Gaussian. However, as illustrated in §5.1 the information-theoretic lower bound
in Theorem 4.1 is established using the construction of $P$ with unbounded support. Correspondingly,
we use a construction of $P$ with unbounded support to establish the lower bound with computational
constraints in Theorem 4.3. By matching the constructions of $P \in \mathcal{P}(s^*,d)$ in the proofs of Theorems
4.1 and 4.3, we can sharply characterize the existence of the gap particularly for sub-Gaussian
distributions with unbounded support.

5 Proof of Main Results 

In the sequel, we present the proofs of the main results in §4. We first lay out the proofs for sparse 
principal submatrix estimation, and then the proofs for stochastic block model. 

5.1 Proof for Sparse Principal Submatrix Estimation 

Before we establish the proof of Theorem 4.1, we present a corollary of Theorem 2.2 of Butucea and
Ingster (2013). Let $\beta$ be a quantity that scales with $s^*$ and $d$. It establishes the sufficient conditions
under which distinguishing $\beta^* = 0$ and $\beta^* = \beta$ is impossible. Recall $\mathcal{P}(s^*,d)$ denotes the distribution
family specified in §2.1.

Corollary 5.1. We consider testing $H_0: \beta_0 = 0$ against $H_1: \beta_1 = \beta$. For any test $\phi: \mathbb{R}^{d \times d} \to \{0,1\}$
based on $X$, if $\beta (s^*)^2/d = o(1)$ and $\limsup \beta^2 s^*/\log(d/s^*) < C$, there exist $P_0, P_1 \in \mathcal{P}(s^*,d)$, which
correspond to $H_0$ and $H_1$, such that

$$\inf_{\phi} \max\{P_0(\phi = 1), P_1(\phi = 0)\} \geq 1/4.$$

Here $C > 0$ is an absolute constant.

Proof. Theorem 2.2 of Butucea and Ingster (2013) gives a similar result for $X$ with Gaussian entries.
Therefore, their $P_0$ and $P_1$ fall within $\mathcal{P}(s^*,d)$ specified in §2.1 up to rescaling of variance. Besides,
it is worth noting that Butucea and Ingster (2013) do not assume $X$ is symmetric. Nevertheless, the
proof for symmetric $X$ follows similarly from their proof. □

Equipped with Corollary 5.1, we are now ready to prove Theorem 4.1. 

Proof of Theorem 4.1. We consider testing hypotheses $H_0: \beta_0 = 0$ and $H_1: \beta_1 = \beta$ with

$$\beta = C \sqrt{1/s^* \cdot \log(d/s^*)}, \tag{5.1}$$

where $C$ is an absolute constant that is sufficiently small. By Corollary 5.1, there exist $P_0, P_1 \in \mathcal{P}(s^*,d)$
corresponding to $H_0$ and $H_1$, such that for any test $\phi: \mathbb{R}^{d \times d} \to \{0,1\}$,

$$\inf_{\phi} \max\{P_0(\phi = 1), P_1(\phi = 0)\} \geq 1/4, \quad \text{for } \beta (s^*)^2/d = o(1). \tag{5.2}$$
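We consider a specific test based on $\hat\beta$, which is defined as $\phi(\hat\beta) = 1(\hat\beta \geq \beta/2)$. From (5.2) we
have

$$\begin{aligned}
\inf_{\hat\beta} \max\bigl\{P_0(|\hat\beta - \beta_0| \geq \beta/2),\ P_1(|\hat\beta - \beta_1| \geq \beta/2)\bigr\}
&= \inf_{\hat\beta} \max\bigl\{P_0(|\hat\beta| \geq \beta/2),\ P_1(|\hat\beta - \beta| \geq \beta/2)\bigr\} \\
&\geq \inf_{\hat\beta} \max\bigl\{P_0[\phi(\hat\beta) = 1],\ P_1[\phi(\hat\beta) = 0]\bigr\} \\
&\geq \inf_{\phi} \max\bigl\{P_0(\phi = 1),\ P_1(\phi = 0)\bigr\} \geq 1/4.
\end{aligned} \tag{5.3}$$

Here the first inequality holds because under $H_0$, $\phi(\hat\beta) = 1$ implies $|\hat\beta - \beta_0| \geq \beta/2$ by definition and
under $H_1$, $\phi(\hat\beta) = 0$ implies $|\hat\beta - \beta_1| > \beta/2$. The second-to-last inequality holds because $\phi(\hat\beta)$ is a
specific class of tests. Consequently, we have

$$\begin{aligned}
\inf_{\hat\beta} \sup_{P \in \mathcal{P}(s^*,d)} \mathbb{E}_P |\hat\beta - \beta^*|
&\geq \inf_{\hat\beta} \max\bigl\{\mathbb{E}_{P_0} |\hat\beta - \beta_0|,\ \mathbb{E}_{P_1} |\hat\beta - \beta_1|\bigr\} \\
&\geq \beta/2 \cdot \inf_{\hat\beta} \max\bigl\{P_0(|\hat\beta - \beta_0| \geq \beta/2),\ P_1(|\hat\beta - \beta_1| \geq \beta/2)\bigr\} \geq \beta/8,
\end{aligned} \tag{5.4}$$

where the second inequality is from Markov's inequality and the last is from (5.3). By plugging (5.1)
into (5.4), we reach the conclusion. □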


In the sequel, we prove the upper bound in Proposition 4.2. 

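Proof of Proposition 4.2. For integer $s > 0$, we denote by $\mathcal{V}_s$ the set of $v \in \mathbb{R}^d$ with exactly $s$ entries
being one and the others being zero. By definition, in (2.1) we have

$$\sup_{S \subseteq \{1,\ldots,d\},\, |S| = s^*} \ \sum_{(i,j) \in S \times S,\, i < j} X_{i,j} = \sup_{v \in \mathcal{V}_{s^*}} v^\top X v / 2. \tag{5.5}$$

Recall that by our definition we have $X_{i,i} = 0$ for all $i \in \{1,\ldots,d\}$ and $\mathbb{E} X = \Theta$. Note that

$$\Bigl| \sup_{v \in \mathcal{V}_{s^*}} v^\top X v - \sup_{v \in \mathcal{V}_{s^*}} v^\top \Theta v \Bigr| \leq \sup_{v \in \mathcal{V}_{s^*}} |v^\top (X - \Theta) v|. \tag{5.6}$$

Since $X \sim P \in \mathcal{P}(s^*,d)$, for any fixed $v \in \mathcal{V}_{s^*}$, $v^\top (X - \Theta) v$ is twice the summation of $s^*(s^*-1)/2$
independent sub-Gaussian random variables that have mean zero and $\psi_2$-norm at most one. Hence,
for any fixed $v \in \mathcal{V}_{s^*}$ we have

$$\mathbb{P}\bigl[|v^\top (X - \Theta) v| > t\bigr] \leq \exp\{1 - C t^2/[s^*(s^*-1)]\}.$$

Then by union bound, we have

$$\mathbb{P}\Bigl[\sup_{v \in \mathcal{V}_{s^*}} |v^\top (X - \Theta) v| > t\Bigr]
\leq \binom{d}{s^*} \exp\{1 - C t^2/[s^*(s^*-1)]\}
\leq \exp\{1 - C t^2/[s^*(s^*-1)] + s^* \log(d/s^*)\}.$$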
Setting the right-hand side to be $\delta$, we obtain

$$t = C \sqrt{\log(e/\delta) + s^* \log(d/s^*)} \cdot \sqrt{s^*(s^*-1)}. \tag{5.7}$$

Plugging (5.7) into (5.6), we have that with probability at least $1 - \delta$,

$$\Bigl| \sup_{v \in \mathcal{V}_{s^*}} v^\top X v - \sup_{v \in \mathcal{V}_{s^*}} v^\top \Theta v \Bigr| \leq C \sqrt{\log(e/\delta) + s^* \log(d/s^*)} \cdot \sqrt{s^*(s^*-1)}.$$

Note that $\sup_{v \in \mathcal{V}_{s^*}} v^\top \Theta v = s^*(s^*-1) \cdot \beta^*$. Then by (2.1) and (5.5) we obtain that

$$|\hat\beta^{\rm scan} - \beta^*| \leq C \sqrt{\log(e/\delta) + s^* \log(d/s^*)} \big/ \sqrt{s^*(s^*-1)}$$

holds with probability at least $1 - \delta$. Setting $\delta = 1/d$, we reach the conclusion. □

In the following we prove Theorem 4.3.


Proof of Theorem 4.3. In this proof, we focus on specific distributions in $\mathcal{P}(s^*,d)$ with $\beta^* = 0$. We
consider $X_{i,j}$'s ($i < j$) being sub-Gaussian random variables which satisfy the constraints in §2.1. In
addition, we assume that $|X_{i,j}| \geq \nu$ almost surely and $\mathbb{P}(X_{i,j} > 0) = \mathbb{P}(X_{i,j} < 0) = 1/2$ for all $i < j$
and constant $\nu > 0$. Under such a distribution, we construct a matrix $\Pi^{(\ell)} \in \mathbb{R}^{d^{(\ell)} \times d^{(\ell)}}$ which is a
feasible solution to the $\ell$-th level SoS program in (3.6) with high probability. We further prove that
the objective value corresponding to $\Pi^{(\ell)}$ is larger than $\nu$, which indicates that the maximum of the
corresponding SoS program is at least $\nu$ with high probability. In the rest of this proof, we denote
$X + \nu \cdot I_d$ to be $X$.

Hereafter, we denote by $X_{S,S'}$ the submatrix of $X$ whose row indices are in $S$ and column indices
are in $S'$. For notational simplicity, we define the expansivity $\eta(S, X)$ of some set $S \subseteq \{1,\ldots,d\}$ to
be the number of sets $S' \subseteq \{1,\ldots,d\}$ that satisfy $|S'| = 2\ell$, $S \subseteq S'$ and ${\rm sign}(X_{S',S'}) = 1_{2\ell \times 2\ell}$. Here
${\rm sign}(X)$ is a matrix that satisfies $[{\rm sign}(X)]_{i,j} = 1$ if $X_{i,j} > 0$ and $[{\rm sign}(X)]_{i,j} = 0$ if $X_{i,j} < 0$. Note
that $\eta(S, X)$ is nonzero only if $X_{i,j} > 0$ for all $i \in S$, $j \in S$. Hence, $\eta(S, X)$ gives the number of $X$'s
submatrices that are extended from $X_{S,S}$ and have size $2\ell \times 2\ell$ with all entries being positive. It is
worth noting that by definition $\eta(S, X)$ is a random quantity, which depends on the random matrix
$X$. Recall that each entry of $\Pi^{(\ell)}$ is indexed by collections $C_1$ and $C_2$, and $M(C_1)$ and $M(C_2)$
are the respective sets, which have distinct elements. Based on the construction of dual certificates
of Meka et al. (2015), we construct $\Pi^{(\ell)}$ as

$$\Pi^{(\ell)}_{C_1,C_2} = \frac{\eta[M(C_1) \cup M(C_2), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C_1) \cup M(C_2)|]!}{(2\ell)!/[2\ell - |M(C_1) \cup M(C_2)|]!}. \tag{5.8}$$
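The construction can be explored numerically on a tiny sign pattern. The sketch below computes the expansivity by brute force, forms entries of the candidate solution as the expansivity ratio times a ratio of falling factorials, and checks the sum constraint of the SoS program exactly with rational arithmetic. All function names are hypothetical. Diagonal entries are treated as positive, mirroring the diagonal shift of $X$ described in the proof, and the check assumes that at least one all-positive $2\ell \times 2\ell$ principal submatrix exists.

```python
from fractions import Fraction
from itertools import combinations

def expansivity(S, X, ell):
    """Count the sets S' of size 2*ell that contain S and have all off-diagonal entries positive."""
    S = frozenset(S)
    rest = [i for i in range(len(X)) if i not in S]
    need = 2 * ell - len(S)
    if need < 0:
        return 0
    count = 0
    for extra in combinations(rest, need):
        T = sorted(S | set(extra))
        if all(X[i][j] > 0 for i in T for j in T if i != j):
            count += 1
    return count

def falling(n, m):
    """Falling factorial n (n - 1) ... (n - m + 1), i.e. n! / (n - m)! when m <= n."""
    out = 1
    for k in range(m):
        out *= n - k
    return out

def pi_entry(C1, C2, X, s, ell):
    """Entry indexed by collections C1, C2 (tuples); depends only on their merged set."""
    M = frozenset(C1) | frozenset(C2)
    m = len(M)  # requires m <= 2 * ell
    ratio = Fraction(expansivity(M, X, ell), expansivity((), X, ell))
    return ratio * Fraction(falling(s, m), falling(2 * ell, m))

def sum_constraint_holds(C1, C2, X, s, ell):
    """Check sum_i Pi[C1 + {i}, C2] == s * Pi[C1, C2] exactly."""
    lhs = sum(pi_entry(C1 + (i,), C2, X, s, ell) for i in range(len(X)))
    return lhs == s * pi_entry(C1, C2, X, s, ell)

X = [[0, 1, -1, -1],
     [1, 0, 1, -1],
     [-1, 1, 0, 1],
     [-1, -1, 1, 0]]
print(sum_constraint_holds((), (1,), X, s=2, ell=1))
```

Exact fractions avoid floating-point noise, so the equality test is a genuine check of the identity on the chosen instance rather than an approximate one.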


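Now we verify $\Pi^{(\ell)}$ defined in (5.8) satisfies all the constraints of the $\ell$-th level SoS program in (3.6).
First, we have $\Pi^{(\ell)}_{\emptyset,\emptyset} = 1$ from (5.8). Also, $\Pi^{(\ell)}$ satisfies $\Pi^{(\ell)}_{C_1,C_2} = \Pi^{(\ell)}_{C_1',C_2'}$ for all $C_1 + C_2 = C_1' + C_2'$, since

$$M(C_1) \cup M(C_2) = M(C_1 + C_2) = M(C_1' + C_2') = M(C_1') \cup M(C_2')$$

by the definition of the merge operation $M(\cdot)$. Meanwhile, it holds that $\Pi^{(\ell)}_{C_1+\{i,i\},C_2} = \Pi^{(\ell)}_{C_1+\{i\},C_2}$ for
all $C_1$ and $C_2$ with $|C_1| \leq \ell - 2$ and $|C_2| \leq \ell$, since in (5.8) we have

$$M(C_1 + \{i,i\}) \cup M(C_2) = M(C_1 + \{i\}) \cup M(C_2).$$

Now we prove that $\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}$ holds for all $|C_1| \leq \ell - 1$ and $|C_2| \leq \ell$. Let $C = C_1 + C_2$,
which satisfies $|M(C)| \leq |C| \leq 2\ell - 1$. By (5.8) we have

$$\Pi^{(\ell)}_{C_1+\{i\},C_2} = \frac{\eta[M(C + \{i\}), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C + \{i\})|]!}{(2\ell)!/[2\ell - |M(C + \{i\})|]!}, \tag{5.9}$$

where we use the fact that

$$M(C_1 + \{i\}) \cup M(C_2) = M(C_1 + C_2 + \{i\}) = M(C + \{i\}).$$

Also, note that $M(C + \{i\}) = M(C)$ for $i \in M(C)$. In addition, it holds that $M(C + \{i\}) = M(C) \cup \{i\}$
and $|M(C + \{i\})| = |M(C)| + 1$ for $i \notin M(C)$. From (5.9) we have

$$\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2}
= \underbrace{\sum_{i \in M(C)} \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)|]!}{(2\ell)!/[2\ell - |M(C)|]!}}_{\rm (i)}
+ \underbrace{\sum_{i \notin M(C)} \frac{\eta[M(C) \cup \{i\}, X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)| - 1]!}{(2\ell)!/[2\ell - |M(C)| - 1]!}}_{\rm (ii)}. \tag{5.10}$$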


Now we characterize the relationship between $\eta[M(C), X]$ and $\eta[M(C) \cup \{i\}, X]$ with $i \notin M(C)$. Let
$S_1, S_2, \ldots, S_{\eta[M(C),X]}$ be the distinct sets satisfying $|S_j| = 2\ell - |M(C)|$, $M(C) \cap S_j = \emptyset$
and $\mathrm{sign}(X_{S_j \cup M(C),\, S_j \cup M(C)}) = 1_{2\ell \times 2\ell}$ for all $j \in \{1, \ldots, \eta[M(C), X]\}$. By setting $S^{\#} = \bigcup_{j=1}^{\eta[M(C),X]} S_j$,
we have that

$$
\sum_{i \notin M(C)} \eta[M(C) \cup \{i\}, X] = \sum_{i \in S^{\#}} \eta[M(C) \cup \{i\}, X] = \sum_{i \in S^{\#}} \sum_{j=1}^{\eta[M(C),X]} 1(i \in S_j)
= \sum_{j=1}^{\eta[M(C),X]} \sum_{i \in S^{\#}} 1(i \in S_j) = \sum_{j=1}^{\eta[M(C),X]} |S_j| = \eta[M(C), X] \cdot [2\ell - |M(C)|].
$$

Here the first equality is from $\eta[M(C) \cup \{i\}, X] = 0$ for $i \notin S^{\#}$, since in this case

$$
\mathrm{sign}(X_{M(C) \cup \{i\},\, M(C) \cup \{i\}}) \ne 1_{|M(C) \cup \{i\}|,\, |M(C) \cup \{i\}|}.
$$

The second equality holds because to calculate $\eta[M(C) \cup \{i\}, X]$, we only need to count the number
of $S_j$'s that include $i$. The last equality is from $|S_j| = 2\ell - |M(C)|$. Therefore, for term (ii) in (5.10)
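we have

$$
\sum_{i \notin M(C)} \frac{\eta[M(C) \cup \{i\}, X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)| - 1]!}{(2\ell)!/[2\ell - |M(C)| - 1]!}
= \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot [2\ell - |M(C)|] \cdot \frac{s^*!/[s^* - |M(C)| - 1]!}{(2\ell)!/[2\ell - |M(C)| - 1]!}
= \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)| - 1]!}{(2\ell)!/[2\ell - |M(C)|]!}. \tag{5.11}
$$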
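The counting identity used for (5.11), $\sum_{i \notin S} \eta(S \cup \{i\}, X) = \eta(S, X) \cdot (2\ell - |S|)$, can be checked by brute force on a small example. This is an illustrative sketch only: the random symmetric sign matrix, the sizes $d = 7$ and $\ell = 2$, and the helper names are assumptions made here, not taken from the original.

```python
import random
from itertools import combinations


def expansivity(S, X, ell):
    """Count supersets S' of S with |S'| = 2*ell whose submatrix X[S', S'] is all positive."""
    S = frozenset(S)
    extra = 2 * ell - len(S)
    if extra < 0:
        return 0
    rest = [i for i in range(len(X)) if i not in S]
    return sum(
        all(X[i][j] > 0 for i in S | set(T) for j in S | set(T))
        for T in combinations(rest, extra)
    )


def counting_identity_holds(X, ell):
    """Check sum_{i not in S} eta(S + {i}) == eta(S) * (2*ell - |S|) for all |S| < 2*ell."""
    d = len(X)
    for size in range(2 * ell):
        for S in combinations(range(d), size):
            lhs = sum(expansivity(S + (i,), X, ell) for i in range(d) if i not in S)
            rhs = expansivity(S, X, ell) * (2 * ell - size)
            if lhs != rhs:
                return False
    return True


# Hypothetical symmetric sign matrix with positive diagonal.
random.seed(0)
d, ell = 7, 2
X = [[0] * d for _ in range(d)]
for i in range(d):
    X[i][i] = 1
    for j in range(i + 1, d):
        X[i][j] = X[j][i] = random.choice([-1, 1])
identity_ok = counting_identity_holds(X, ell)
```

The identity is a double-counting argument, so the check should pass for any sign pattern, including sets $S$ whose own submatrix is not all positive (both sides are then zero).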

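Meanwhile, for term (i) in (5.10) we have

$$
|M(C)| \cdot \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)|]!}{(2\ell)!/[2\ell - |M(C)|]!}
= (|M(C)| - s^*) \cdot \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)|]!}{(2\ell)!/[2\ell - |M(C)|]!}
+ s^* \cdot \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)|]!}{(2\ell)!/[2\ell - |M(C)|]!}
$$

$$
= - \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)| - 1]!}{(2\ell)!/[2\ell - |M(C)|]!}
+ s^* \cdot \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)|]!}{(2\ell)!/[2\ell - |M(C)|]!}. \tag{5.12}
$$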

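Plugging (5.11) and (5.12) into (5.10), we obtain

$$
\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \frac{\eta[M(C), X]}{\eta(\emptyset, X)} \cdot \frac{s^*!/[s^* - |M(C)|]!}{(2\ell)!/[2\ell - |M(C)|]!} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}.
$$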
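The factorial bookkeeping behind this step can be checked with exact rational arithmetic. The sketch below is illustrative rather than part of the proof: the helper names `ratio` and `sum_over_i` are assumptions, $m$ stands for $|M(C)|$, and `eta` stands for the ratio $\eta[M(C), X]/\eta(\emptyset, X)$, which is treated as a free parameter.

```python
from fractions import Fraction
from math import factorial


def ratio(s_star, ell, m):
    """The factor (s*!/(s*-m)!) / ((2l)!/(2l-m)!) appearing in (5.8)."""
    return (Fraction(factorial(s_star), factorial(s_star - m))
            / Fraction(factorial(2 * ell), factorial(2 * ell - m)))


def sum_over_i(s_star, ell, m, eta=Fraction(1)):
    """Right-hand side of (5.10), with eta = eta[M(C), X] / eta(empty, X)."""
    # Term (i): each of the m indices i in M(C) leaves M(C) unchanged.
    term_i = m * eta * ratio(s_star, ell, m)
    # Term (ii): by the counting identity, the expansivities of M(C) + {i}
    # over i outside M(C) add up to eta * (2*ell - m).
    term_ii = eta * (2 * ell - m) * ratio(s_star, ell, m + 1)
    return term_i + term_ii


# Check the sum constraint for a few hypothetical (s*, ell) pairs with s* >= 2*ell.
constraint_ok = all(
    sum_over_i(s, l, m) == s * ratio(s, l, m)
    for s, l in [(4, 2), (7, 2), (10, 3)]
    for m in range(2 * l)
)
```

Exact `Fraction` arithmetic is used instead of floats so that the equality test reflects the algebraic identity rather than rounding.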

Thus, we conclude that $\Pi^{(\ell)}$ satisfies all the constraints of the $\ell$-th level SoS program in (3.6) except
$\Pi^{(\ell)} \succeq 0$. We defer the verification of this constraint to the end of the proof. Next we calculate the
value of objective function corresponding to $\Pi^{(\ell)}$. Note that


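$$
\sum_{i,j=1}^{d} X_{ij} \cdot \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{i,j=1}^{d} 1(X_{ij} > 0) \cdot X_{ij} \cdot \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{i,j=1}^{d} 1(X_{ij} > 0) \cdot |X_{ij}| \cdot \Pi^{(\ell)}_{\{i\},\{j\}},
$$

where the first equality holds because by the definition of $\eta(\cdot, \cdot)$, it holds $\eta(\{i, j\}, X) = 0$ for $X_{ij} < 0$,
which implies $\Pi^{(\ell)}_{\{i\},\{j\}} = 0$ correspondingly. Moreover, we have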


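$$
\sum_{i,j=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{j=1}^{d} \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{j=1}^{d} s^* \cdot \Pi^{(\ell)}_{\emptyset,\{j\}} = s^* \cdot \sum_{j=1}^{d} \Pi^{(\ell)}_{\emptyset,\{j\}} = s^* \cdot s^* \cdot \Pi^{(\ell)}_{\emptyset,\emptyset} = (s^*)^2,
$$

where the second and second last equalities are from the constraint $\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}$, and
the last is from $\Pi^{(\ell)}_{\emptyset,\emptyset} = 1$. Similarly, we have

$$
\sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{i\}} = \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\emptyset} = s^*,
$$

where the first equality follows from the constraints $\Pi^{(\ell)}_{C_1+\{i,i\},C_2} = \Pi^{(\ell)}_{C_1+\{i\},C_2}$ and $\Pi^{(\ell)}_{C_1,C_2} = \Pi^{(\ell)}_{C_1',C_2'}$ for
$C_1 + C_2 = C_1' + C_2'$, and the second is from $\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}$.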


Recall that $|X_{ij}| \ge \nu$ almost surely and the objective function is equivalent to

$$
\frac{1}{s^*(s^* - 1)} \sum_{i \ne j} X_{ij} \cdot \Pi^{(\ell)}_{\{i\},\{j\}}
\ge \frac{\nu}{s^*(s^* - 1)} \sum_{i \ne j} \Pi^{(\ell)}_{\{i\},\{j\}}
= \frac{\nu}{s^*(s^* - 1)} \Big[ \sum_{i,j=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}} - \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{i\}} \Big]
= \nu \cdot \frac{(s^*)^2 - s^*}{s^*(s^* - 1)} = \nu.
$$

Hence, the objective value corresponding to $\Pi^{(\ell)}$ is at least $\nu$. Because $\hat{\beta}$ is the maximum of the $\ell$-th
level SoS program or its relaxed versions, so far we obtain

$$
\mathbb{P}(\hat{\beta} \ge \nu \mid \Pi^{(\ell)} \succeq 0) = 1. \tag{5.13}
$$

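In the sequel, we verify that $\Pi^{(\ell)} \succeq 0$ holds with high probability. We invoke Lemma 6.3 of Meka
et al. (2015), which considers a matrix $M^{(\ell)}$ of size $\sum_{j=0}^{\ell} \binom{d}{j} \times \sum_{j=0}^{\ell} \binom{d}{j}$ indexed by sets $S_1, S_2 \subseteq \{1, \ldots, d\}$,
which satisfies $M^{(\ell)}_{S_1,S_2} = \Pi^{(\ell)}_{C_1,C_2}$ for $S_1 = M(C_1)$ and $S_2 = M(C_2)$. Their result implies that under the
distribution within $\mathcal{P}(s^*, d)$ specified at the beginning of our proof, $M^{(\ell)} \succeq 0$ holds with probability
at least $1/2$ for sufficiently large $s^*$ and $d$, and $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$. Note $M^{(\ell)}$ is a submatrix
of $\Pi^{(\ell)}$, i.e.,

$$
M^{(\ell)} = \Pi^{(\ell)}_{\{C : |C| = |M(C)|\},\, \{C : |C| = |M(C)|\}}.
$$

In other words, we can simultaneously permute the rows and columns of $\Pi^{(\ell)}$, which are indexed by
the collection $C$'s that satisfy $|C| = |M(C)|$, to the upper-left corner of $\Pi^{(\ell)}$. Then $M^{(\ell)}$ is identical
to such a $\sum_{j=0}^{\ell} \binom{d}{j} \times \sum_{j=0}^{\ell} \binom{d}{j}$ upper-left submatrix of $\Pi^{(\ell)}$. Meanwhile, note that by (5.8) we have

$$
\Pi^{(\ell)}_{C_1, \cdot} = \Pi^{(\ell)}_{C_1', \cdot} \quad \text{and} \quad \Pi^{(\ell)}_{\cdot, C_1} = \Pi^{(\ell)}_{\cdot, C_1'}, \quad \text{for all } |C_1| = |M(C_1)|,\ M(C_1) = M(C_1').
$$

Here $\Pi^{(\ell)}_{C, \cdot}$ and $\Pi^{(\ell)}_{\cdot, C}$ denote the row and column corresponding to collection $C$. Thus, for any vector
$u$ indexed by collections, we have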


$$
u^\top \Pi^{(\ell)} u = \sum_{C_1} \sum_{C_2} u_{C_1} \Pi^{(\ell)}_{C_1,C_2} u_{C_2}
= \sum_{C_1 : |C_1| = |M(C_1)|} \ \sum_{C_2 : |C_2| = |M(C_2)|} \Big( \sum_{C_1' : M(C_1') = M(C_1)} u_{C_1'} \Big) \Pi^{(\ell)}_{C_1,C_2} \Big( \sum_{C_2' : M(C_2') = M(C_2)} u_{C_2'} \Big)
= \bar{u}^\top M^{(\ell)} \bar{u}, \tag{5.14}
$$


where $\bar{u}$ is indexed by sets and $\bar{u}_S = \sum_{C : M(C) = S} u_C$. Thus, from (5.14) and the fact that
$M^{(\ell)} \succeq 0$ with probability at least $1/2$, we have $\Pi^{(\ell)} \succeq 0$ holds with the same probability. Moreover,
according to (5.13) and our setting that $\beta^* = 0$, by Markov's inequality we have

$$
\mathbb{E}|\hat{\beta} - \beta^*| \ge \nu \cdot \mathbb{P}(\hat{\beta} \ge \nu) \ge \nu \cdot \mathbb{P}(\hat{\beta} \ge \nu \mid \Pi^{(\ell)} \succeq 0) \cdot \mathbb{P}(\Pi^{(\ell)} \succeq 0) \ge 1/2 \cdot \nu \tag{5.15}
$$

for all $\hat{\beta}$ in the class of estimators under consideration and $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$. Recall $\nu$ is a positive constant and our construction of
distributions are within $\mathcal{P}(s^*, d)$. Hence, we conclude the proof. □
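The collapse of the quadratic form in (5.14) can be illustrated on a toy example. In the sketch below, which is only an illustration under stated assumptions, a small positive semidefinite matrix `M` plays the role of $M^{(\ell)}$, the list `g` is a hypothetical stand-in for the merge map sending each collection index to its set index, and the lifted matrix `Pi` repeats rows and columns of `M` accordingly.

```python
import random


def quad_form(A, v):
    """Compute v^T A v for a square matrix given as a list of lists."""
    n = len(v)
    return sum(v[a] * A[a][b] * v[b] for a in range(n) for b in range(n))


def lift(M, g):
    """Build Pi with Pi[a][b] = M[g[a]][g[b]]: rows/columns with the same g-value coincide."""
    return [[M[ga][gb] for gb in g] for ga in g]


def aggregate(u, g, k):
    """Sum the coordinates of u over each group: u_bar[s] = sum of u[a] with g[a] == s."""
    u_bar = [0] * k
    for a, s in enumerate(g):
        u_bar[s] += u[a]
    return u_bar


random.seed(1)
k = 3
B = [[random.randint(-3, 3) for _ in range(k)] for _ in range(k)]
# M = B B^T is positive semidefinite by construction.
M = [[sum(B[r][t] * B[c][t] for t in range(k)) for c in range(k)] for r in range(k)]
g = [0, 1, 1, 2, 2, 2]          # six "collections" merged into three "sets"
Pi = lift(M, g)
u = [random.randint(-5, 5) for _ in g]
lhs = quad_form(Pi, u)
rhs = quad_form(M, aggregate(u, g, k))
```

Since `lhs` equals `rhs` and `M` is positive semidefinite, the lifted matrix inherits nonnegativity of the quadratic form, which is the mechanism used to pass from $M^{(\ell)} \succeq 0$ to $\Pi^{(\ell)} \succeq 0$.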


Finally, we prove Proposition 4.4. 


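Proof of Proposition 4.4. We have

$$
\mathbb{P}(|\hat{\beta}^{\max} - \beta^*| > t) = \mathbb{P}\Big( \Big| \sup_{i,j} X_{ij} - \sup_{i,j} \Theta_{ij} \Big| > t \Big)
\le \mathbb{P}\Big( \sup_{i,j} |X_{ij} - \Theta_{ij}| > t \Big) \le d^2 \cdot \mathbb{P}(|X_{ij} - \Theta_{ij}| > t), \tag{5.16}
$$

where the last inequality follows from union bound. Since $\mathbb{E} X_{ij} = \Theta_{ij}$, we have $\mathbb{E}(X_{ij} - \Theta_{ij}) = 0$.
Moreover, we know that $\|X_{ij} - \Theta_{ij}\|_{\psi_2} \le 1$. By the definition of sub-Gaussian random variable, we
have

$$
\mathbb{P}(|X_{ij} - \Theta_{ij}| > t) \le \exp(1 - Ct^2). \tag{5.17}
$$

Substituting (5.17) into (5.16), we obtain

$$
\mathbb{P}(|\hat{\beta}^{\max} - \beta^*| > t) \le d^2 \exp(1 - Ct^2) = \exp(1 - Ct^2 + 2\log d). \tag{5.18}
$$

Setting the right hand side of (5.18) to be $1/d$, and solving for $t$, we obtain with probability at least
$1 - 1/d$ that

$$
|\hat{\beta}^{\max} - \beta^*| \le C\sqrt{\log d}.
$$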


This completes the proof. □ 
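For concreteness, the step of solving for $t$ can be written out as follows. This is a sketch in which the explicit constant $\sqrt{4/C}$ and the restriction $d \ge 3$ are bookkeeping choices made here, not taken from the original, where the constant in the final bound is generic:

$$
\exp(1 - Ct^2 + 2\log d) = \frac{1}{d}
\iff Ct^2 = 1 + 3\log d
\iff t = \sqrt{\frac{1 + 3\log d}{C}} \le \sqrt{\frac{4}{C}} \cdot \sqrt{\log d} \quad \text{for } d \ge 3,
$$

where the last inequality uses $\log d \ge 1$ for $d \ge 3$.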

5.2 Proof for Stochastic Block Model 

In this section, we present the detailed proofs of the main results for edge probability estimation in 
stochastic block model. We need the following lemma from Arias-Castro and Verzelen (2014), which 
provides the sufficient conditions under which the hypotheses $H_0 : \beta_0^* = p_0$ and $H_1 : \beta_1^* = p_1$ are not
distinguishable. Recall $A$ denotes the adjacency matrix and $\mathcal{P}(s^*, d)$ denotes the distribution family
specified in §2.2.

Lemma 5.2. We consider testing $H_0 : \beta_0^* = p_0$ against $H_1 : \beta_1^* = p_1$. For any test $\phi$ taking values in $\{0, 1\}$
based on $A$, assuming $(s^*)^2 (p_1 - p_0)/(d\, p_0^{1/2}) = o(1)$, $\limsup\, (p_1 - p_0)^2 s^*/[4 p_0 (1 - p_0) \log(d/s^*)] < 1$
and $\log(d/s^*)/(s^* p_0) = o(1)$, we have

$$
\inf_{\phi} \max\{ \mathbb{P}_0(\phi = 1),\ \mathbb{P}_1(\phi = 0) \} \ge 1/4,
$$

where $\mathbb{P}_0, \mathbb{P}_1 \in \mathcal{P}(s^*, d)$ are distributions corresponding to $H_0$ and $H_1$.

Now we are ready to lay out the proof of Theorem 4.5. 
Proof of Theorem 4.5. The proof strategy is similar to Theorem 4.1. In the sequel, we assume $\beta^*$ is
known, since the obtained lower bound implies the lower bound for unknown $\beta^*$. We invoke Lemma
5.2 with $p_0 = \beta^*$ and $p_1 = \beta^* + \bar{\beta}$, where

$$
\bar{\beta} = C \sqrt{1/s^* \cdot \log(d/s^*)}. \tag{5.19}
$$

Then we have that for any test $\phi$ taking values in $\{0, 1\}$ based on the adjacency matrix $A$, it holds that

$$
\inf_{\phi} \max\{ \mathbb{P}_0(\phi = 1),\ \mathbb{P}_1(\phi = 0) \} \ge 1/4, \quad \text{for } (s^*)^2 \bar{\beta}/[d\, (\beta^*)^{1/2}] = o(1) \text{ and } \log(d/s^*)/(s^* \beta^*) = o(1). \tag{5.20}
$$

It is easy to verify the conditions in (5.20) are implied by the conditions of Theorem 4.5 and (5.19).
Following the derivation of (5.4) in the proof of Theorem 4.1, we consider a specific test based
on $\hat{\beta}$, which is defined as $\phi(\hat{\beta}) = 1(\hat{\beta} \ge \beta^* + \bar{\beta}/2)$. We have

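$$
\inf_{\hat{\beta}} \max\{ \mathbb{P}_0(|\hat{\beta} - \beta_0^*| \ge \bar{\beta}/2),\ \mathbb{P}_1(|\hat{\beta} - \beta_1^*| \ge \bar{\beta}/2) \}
= \inf_{\hat{\beta}} \max\{ \mathbb{P}_0(|\hat{\beta} - \beta^*| \ge \bar{\beta}/2),\ \mathbb{P}_1(|\hat{\beta} - \beta^* - \bar{\beta}| \ge \bar{\beta}/2) \}
$$

$$
\ge \inf_{\hat{\beta}} \max\{ \mathbb{P}_0[\phi(\hat{\beta}) = 1],\ \mathbb{P}_1[\phi(\hat{\beta}) = 0] \} \ge \inf_{\phi} \max\{ \mathbb{P}_0(\phi = 1),\ \mathbb{P}_1(\phi = 0) \} \ge 1/4, \tag{5.21}
$$

where the equality is obtained by plugging $\beta_0^*$ and $\beta_1^*$. The first inequality holds because $\phi(\hat{\beta}) = 1$
implies $|\hat{\beta} - \beta^*| \ge \bar{\beta}/2$, and $\phi(\hat{\beta}) = 0$ implies $|\hat{\beta} - \beta^* - \bar{\beta}| > \bar{\beta}/2$. From (5.21) we obtain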


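$$
\inf_{\hat{\beta}} \sup_{\mathbb{P} \in \mathcal{P}(s^*, d)} \mathbb{E}_{\mathbb{P}} |\hat{\beta} - \beta^*| \ge \inf_{\hat{\beta}} \max\{ \mathbb{E}_{\mathbb{P}_0} |\hat{\beta} - \beta_0^*|,\ \mathbb{E}_{\mathbb{P}_1} |\hat{\beta} - \beta_1^*| \}
\ge \bar{\beta}/2 \cdot \inf_{\hat{\beta}} \max\{ \mathbb{P}_0(|\hat{\beta} - \beta_0^*| \ge \bar{\beta}/2),\ \mathbb{P}_1(|\hat{\beta} - \beta_1^*| \ge \bar{\beta}/2) \} \ge \bar{\beta}/8,
$$

where $\bar{\beta}$ is defined in (5.19), and the second inequality follows from Markov's inequality. This concludes
the proof. □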
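The reduction from estimation to testing rests on two implications of the thresholding test: $\phi = 1$ forces the estimate to be at least $\bar{\beta}/2$ away from the null value, and $\phi = 0$ forces it to be more than $\bar{\beta}/2$ away from the alternative value. A minimal sketch checking both on a grid is given below; the numerical values of `beta_star` and `beta_bar` and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction


def phi(beta_hat, beta_star, beta_bar):
    """Thresholding test: reject the null when the estimate exceeds beta* + beta_bar / 2."""
    return 1 if beta_hat >= beta_star + beta_bar / 2 else 0


def implications_hold(beta_star, beta_bar, grid):
    """Check the two implications used for the first inequality in (5.21)."""
    for b in grid:
        if phi(b, beta_star, beta_bar) == 1:
            ok = abs(b - beta_star) >= beta_bar / 2
        else:
            ok = abs(b - beta_star - beta_bar) > beta_bar / 2
        if not ok:
            return False
    return True


# Hypothetical values; exact fractions avoid floating-point ties at the threshold.
beta_star, beta_bar = Fraction(1, 2), Fraction(1, 10)
grid = [Fraction(k, 200) for k in range(201)]
all_ok = implications_hold(beta_star, beta_bar, grid)
```

Because each error event of the test is contained in the corresponding estimation-error event, any lower bound on the testing error transfers to the estimator.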


In the following, we prove Proposition 4.6. 

Proof of Proposition 4.6. The proof is similar to Proposition 4.2. We only need to note that $A - \mathbb{E}[A]$
is a symmetric matrix, whose entries within the upper-right triangle are independently sub-Gaussian
and satisfy

$$
\|A_{ij} - \mathbb{E} A_{ij}\|_{\psi_2} \le 1, \quad \text{for all } i < j,
$$

since $A_{ij}$ is Bernoulli and $|A_{ij} - \mathbb{E} A_{ij}| \le 1$. Then replacing $X$ with $A$ in the proof of Proposition
4.2, we reach the conclusion. □

In the following, we lay out the proof of Theorem 4.7. 
Proof of Theorem 4.7. We consider a specific distribution in $\mathcal{P}(s^*, d)$ under which all edge probabilities
equal $1/2$, so that $\beta^* = 1/2$. Let $\bar{A} = A + I_d$. Under such a distribution, we construct a matrix $\Pi^{(\ell)}$,
which is a feasible solution to the $\ell$-th level SoS optimization problem with high probability. Then
we prove the objective value corresponding to $\Pi^{(\ell)}$ is one, which implies that the maximum of the
respective SoS program is at least one with high probability.

Different from the proof of Theorem 4.3, we define the expansivity $\eta(S, \bar{A})$ of $S \subseteq \{1, \ldots, d\}$ as
the number of sets $S' \subseteq \{1, \ldots, d\}$ satisfying $|S'| = 2\ell$, $S \subseteq S'$ and $\bar{A}_{S',S'} = 1_{2\ell, 2\ell}$. Note $\eta(S, \bar{A})$ is
nonzero only if $\bar{A}_{S,S} = 1_{|S|,|S|}$. Therefore, $\eta(S, \bar{A})$ gives the number of $\bar{A}$'s submatrices which are
extended from $\bar{A}_{S,S}$ and have size $2\ell \times 2\ell$ with all entries being one. Recall that each entry of
$\Pi^{(\ell)}$ is indexed by collections $C_1$ and $C_2$, and $M(C_1)$ and $M(C_2)$ are the corresponding sets, which
have distinct elements. Similar to (5.8), we construct each entry of $\Pi^{(\ell)}$ as

$$
\Pi^{(\ell)}_{C_1, C_2} = \frac{\eta[M(C_1) \cup M(C_2), \bar{A}]}{\eta(\emptyset, \bar{A})} \cdot \frac{s^*!/[s^* - |M(C_1) \cup M(C_2)|]!}{(2\ell)!/[2\ell - |M(C_1) \cup M(C_2)|]!}. \tag{5.22}
$$


Note that the construction of $\Pi^{(\ell)}$ is exactly the same as (5.8), except that we replace $X$ with $\bar{A}$.
Also, by the same calculation as in the proof of Theorem 4.3, we can verify $\Pi^{(\ell)}$ defined in (5.22)
satisfies the constraints of the $\ell$-th level SoS optimization problem.
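As a sanity check of the construction in (5.22), the normalization sums used in the objective calculation can be evaluated by brute force on a small graph. The sketch below is illustrative only: the six-vertex graph, the choice $\ell = 2$ and $s^* = 5$, the helper names, and the form of the objective as a normalized sum over off-diagonal pairs are assumptions made here for the example.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def expansivity(S, A_bar, ell):
    """Count supersets S' of S with |S'| = 2*ell whose submatrix of A_bar is all ones."""
    S = frozenset(S)
    extra = 2 * ell - len(S)
    if extra < 0:
        return 0
    rest = [i for i in range(len(A_bar)) if i not in S]
    return sum(
        all(A_bar[i][j] == 1 for i in S | set(T) for j in S | set(T))
        for T in combinations(rest, extra)
    )


def pi_pair(i, j, A_bar, ell, s_star):
    """Entry of (5.22) indexed by the single-element collections {i} and {j}."""
    U = {i, j}
    m = len(U)
    falling_s = Fraction(factorial(s_star), factorial(s_star - m))
    falling_2l = Fraction(factorial(2 * ell), factorial(2 * ell - m))
    return Fraction(expansivity(U, A_bar, ell), expansivity((), A_bar, ell)) * falling_s / falling_2l


# Hypothetical graph: complete graph on {0,...,4} minus edge (0,1), plus vertex 5 joined to 2, 3, 4.
d, ell, s_star = 6, 2, 5
edges = [(a, b) for a, b in combinations(range(5), 2) if (a, b) != (0, 1)] + [(2, 5), (3, 5), (4, 5)]
A = [[0] * d for _ in range(d)]
for a, b in edges:
    A[a][b] = A[b][a] = 1
A_bar = [[A[i][j] + (1 if i == j else 0) for j in range(d)] for i in range(d)]

total = sum(pi_pair(i, j, A_bar, ell, s_star) for i in range(d) for j in range(d))
diag = sum(pi_pair(i, i, A_bar, ell, s_star) for i in range(d))
objective = Fraction(1, s_star * (s_star - 1)) * sum(
    A[i][j] * pi_pair(i, j, A_bar, ell, s_star) for i in range(d) for j in range(d) if i != j
)
```

On this example `total` equals $(s^*)^2$, `diag` equals $s^*$, and the normalized off-diagonal sum equals one, matching the values derived in the proof.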

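Next we calculate the value of objective function corresponding to $\Pi^{(\ell)}$. Note that

$$
\sum_{i,j=1}^{d} \bar{A}_{ij} \cdot \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{i,j=1}^{d} 1(\bar{A}_{ij} = 1) \cdot \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{i,j=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}}.
$$

Here both equalities hold because according to the definition of $\eta(\cdot, \bar{A})$, it holds that $\eta(\{i, j\}, \bar{A}) = 0$
for $\bar{A}_{ij} \ne 1$, which implies $\Pi^{(\ell)}_{\{i\},\{j\}} = 0$ correspondingly. Moreover, we have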


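$$
\sum_{i,j=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{j=1}^{d} \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}} = \sum_{j=1}^{d} s^* \cdot \Pi^{(\ell)}_{\emptyset,\{j\}} = s^* \cdot \sum_{j=1}^{d} \Pi^{(\ell)}_{\emptyset,\{j\}} = s^* \cdot s^* \cdot \Pi^{(\ell)}_{\emptyset,\emptyset} = (s^*)^2,
$$

where the second and second last equalities are from the constraint $\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}$, and
the last is from $\Pi^{(\ell)}_{\emptyset,\emptyset} = 1$. Recall that the objective function is equivalent to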


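$$
\frac{1}{s^*(s^* - 1)} \sum_{i,j=1}^{d} A_{ij} \cdot \Pi^{(\ell)}_{\{i\},\{j\}}
= \frac{1}{s^*(s^* - 1)} \Big[ \sum_{i,j=1}^{d} \bar{A}_{ij} \cdot \Pi^{(\ell)}_{\{i\},\{j\}} - \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{i\}} \Big]
= \frac{1}{s^*(s^* - 1)} \Big[ \sum_{i,j=1}^{d} \Pi^{(\ell)}_{\{i\},\{j\}} - \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{i\}} \Big]
= \frac{(s^*)^2 - s^*}{s^*(s^* - 1)} = 1.
$$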

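Here the last equality holds because we have

$$
\sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\{i\}} = \sum_{i=1}^{d} \Pi^{(\ell)}_{\{i\},\emptyset} = s^*,
$$

where the first equality follows from the constraints $\Pi^{(\ell)}_{C_1+\{i,i\},C_2} = \Pi^{(\ell)}_{C_1+\{i\},C_2}$ and $\Pi^{(\ell)}_{C_1,C_2} = \Pi^{(\ell)}_{C_1',C_2'}$ for
$C_1 + C_2 = C_1' + C_2'$, and the second is from $\sum_{i=1}^{d} \Pi^{(\ell)}_{C_1+\{i\},C_2} = s^* \cdot \Pi^{(\ell)}_{C_1,C_2}$. Therefore, the objective value
corresponding to $\Pi^{(\ell)}$ is one. Because $\hat{\beta}$ is the maximum of the $\ell$-th level SoS program or its
relaxed versions, so far we obtain

$$
\mathbb{P}(\hat{\beta} \ge 1 \mid \Pi^{(\ell)} \succeq 0) = 1. \tag{5.23}
$$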

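According to the same proof of Theorem 4.3, we have $\Pi^{(\ell)} \succeq 0$ holds with probability at least $1/2$
for $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$. Also, according to (5.23) and our setting that $\beta^* = 1/2$, from Markov's
inequality we have

$$
\mathbb{E}|\hat{\beta} - \beta^*| \ge 1/2 \cdot \mathbb{P}(|\hat{\beta} - \beta^*| \ge 1/2) \ge 1/2 \cdot \mathbb{P}(\hat{\beta} \ge 1 \mid \Pi^{(\ell)} \succeq 0) \cdot \mathbb{P}(\Pi^{(\ell)} \succeq 0) \ge 1/4, \tag{5.24}
$$

for any $\hat{\beta}$ in the class of estimators under consideration and $s^* = o\{[d/(\log d)^2]^{1/(2\ell)}\}$. Recall that our construction of distribution is within
$\mathcal{P}(s^*, d)$. Hence we conclude the proof. □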


6 Conclusions 

In this paper, we investigate the statistical limits of convex relaxations for two statistical problems: 
mean estimation for sparse principal submatrix and edge probability estimation for stochastic block 
model. Different from existing works, which consider the statistical limits of general polynomial-time 
algorithms, we instead characterize the loss in statistical rates incurred by a broad family of convex 
relaxations. At the core of our main theoretical results is a construction-based proof, which does not 
hinge on any unproven hardness hypotheses. Our conclusion is that in order to attain computational 
tractability with convex relaxations, under particular regimes we have to compromise the statistical 
optimality. 